Wish Branch: A New Control Flow Instruction Combining Conditional Branching and Predicated Execution

Authors

  • Hyesoon Kim
  • Onur Mutlu
  • Jared Stark
  • David N. Armstrong
  • Yale N. Patt
Abstract

As processor pipelines get deeper and wider and instruction windows get larger, the branch misprediction penalty increases. Predication has been used to reduce the number of branch mispredictions by eliminating hard-to-predict branches. However, with predication, the processor is guaranteed to fetch and possibly execute useless instructions, which sometimes offsets the performance advantage of having fewer mispredictions. Also, predication does not eliminate the misprediction penalty due to backward loop branches. This paper introduces a new type of branch called a wish branch. Wish branches combine the strengths of traditional conditional branches and predication, allowing instructions to be skipped over as with traditional branches, but also avoiding pipeline flushes due to mispredictions as with predication. Unlike traditional conditional branches, on a wish branch misprediction, the pipeline does not (always) need to be flushed. And, unlike predication, a wish branch (sometimes) avoids fetching from both paths of the control flow. This paper also describes a type of wish branch instruction, called a wish loop, which reduces the branch misprediction penalty for backward loop branches. We describe the software and hardware support required to generate and utilize wish branches. We demonstrate that wish branches can decrease the execution times of SPEC2000 integer benchmarks by 7.8% (up to 21%) compared to traditional conditional branches and by 3% (up to 11%) compared to predicated execution. We also describe several simple hardware optimizations for exploiting and increasing the benefits of wish branches.
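
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the paper: the `Hammock` type, the one-cycle-per-instruction fetch cost, the flat flush penalty, and the `high_confidence` flag that picks between branch-like and predication-like behavior are all assumptions made for illustration. The abstract says only that a wish branch "sometimes" avoids the flush and "sometimes" avoids fetching both paths; how the choice is made is not specified there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hammock:
    """Hypothetical if-then-else region; lengths are in fetched instructions."""
    then_len: int
    else_len: int
    flush_penalty: int  # assumed flat cost (cycles) of a pipeline flush


def branch_cost(h: Hammock, takes_then: bool, predicted_correctly: bool) -> int:
    # Conditional branch: fetch only the executed path, pay a flush on a miss.
    path = h.then_len if takes_then else h.else_len
    return path + (0 if predicted_correctly else h.flush_penalty)


def predicated_cost(h: Hammock) -> int:
    # Predication: both paths are always fetched, and there is never a flush.
    return h.then_len + h.else_len


def wish_cost(h: Hammock, takes_then: bool, predicted_correctly: bool,
              high_confidence: bool) -> int:
    # Assumed policy: behave like a branch when the prediction is trusted,
    # otherwise fall back to predicated execution of both paths.
    if high_confidence:
        return branch_cost(h, takes_then, predicted_correctly)
    return predicated_cost(h)


h = Hammock(then_len=4, else_len=6, flush_penalty=20)
print(branch_cost(h, True, True))        # correctly predicted branch
print(branch_cost(h, True, False))       # mispredicted branch
print(predicated_cost(h))                # predication, regardless of outcome
print(wish_cost(h, True, False, False))  # low-confidence wish branch
```

In this model the wish branch matches the conditional branch when the prediction is trusted and is capped at the predicated cost when it is not, which is the combination of strengths the abstract claims. Real hardware costs depend on fetch width, window size, and predicate dependences, none of which are modeled here.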

Similar resources

Predicate-Based Transformations to Eliminate Control and Data-Irrelevant Cache Misses

The performance of modern processors is increasingly dependent on their ability to execute multiple instructions per cycle. Explicitly Parallel Instruction Computing (EPIC) architectures can achieve high performance by using the compiler to express program instruction level parallelism (ILP) directly to the hardware. The predicated execution feature is critical to the success of the EPIC archit...

Improving Branch Predictors by Combining with Predicated Execution

This paper deals with superscalar processors, which are capable of executing several instructions per clock cycle. Superscalar processors may be considered as the most promising uniprocessor architectures of the post RISC era. Although superscalar processors can be viewed as an evolution of the RISC architectures, they are subject to many more trade-offs than simply the pipeline depth. Executin...

A Comparison of Full and Partial Predicated Execution

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...

A Comparison of Full and Partial Predicated Execution Support for ILP (ISCA-22, Jun 1995)

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...

Hierarchical Control Prediction: Support for Aggressive Predication

Predication of control edges has the potential advantages of improving fetch bandwidth and reducing branch mispredictions. However, heavily predicated code in out-of-order processors can lose significant performance by deferring resolution of the predicates until they are executed, whereas in nonpredicated code those control arcs would have remained as branches, and would be resolved immediatel...

Publication date: 2005